Dirichlet draws are sparse with high probability
Author
Abstract
This note provides an elementary proof of the folklore fact that draws from a Dirichlet distribution (with parameters less than 1) are typically sparse, i.e., most coordinates are small.

1 Bounds

Let \mathrm{Dir}(\alpha) denote a Dirichlet distribution with all parameters equal to \alpha.

Theorem 1.1. Suppose n \ge 2 and (X_1, \ldots, X_n) \sim \mathrm{Dir}(1/n). Then, for any c_0 \ge 1 satisfying 6 c_0 \ln(n) + 1 < 3n,

    \Pr\left[ \left|\left\{ i : X_i \ge \frac{1}{n^{c_0}} \right\}\right| \le 6 c_0 \ln(n) \right] \ge 1 - \frac{1}{n^{c_0}}.

The parameter is taken to be 1/n, which is standard in machine learning. The theorem states that, with high probability, as the exponent on the sparsity threshold grows linearly (n^{-1}, n^{-2}, n^{-3}, \ldots), the number of coordinates above the threshold cannot grow faster than linearly (6\ln(n), 12\ln(n), 18\ln(n), \ldots).

The statement can be parameterized slightly more finely, exposing more tradeoffs than just the threshold and the number of coordinates.

Theorem 1.2. Suppose n \ge 1 and c_1, c_2, c_3 > 0 with c_2 \ln(n) + 1 < 3n, and (X_1, \ldots, X_n) \sim \mathrm{Dir}(c_1/n); then

    \Pr\left[ \left|\left\{ i : X_i \ge n^{-c_3} \right\}\right| \le c_2 \ln(n) \right] \ge 1 - \frac{1}{e^{1/3}} \left(\frac{1}{n}\right)^{c_2/3 - c_1 c_3} - \frac{1}{e^{4/9}} \left(\frac{1}{n}\right)^{4 c_2 / 9}.

The natural question is whether the factor \ln(n) is an artifact of the analysis; simulation experiments with Dirichlet parameter \alpha = 1/n, summarized in Figure 1a, exhibit both the \ln(n) term and the linear relationship between the sparsity threshold and the number of coordinates exceeding it.

The techniques here are loose when applied to the case \alpha = o(1/n). In particular, Figure 1b suggests that \alpha = 1/n^2 leads to a single nonsmall coordinate with high probability, which is stronger than what is captured by the following theorem.

Theorem 1.3. Suppose n \ge 3 and (X_1, \ldots, X_n) \sim \mathrm{Dir}(1/n^2); then

    \Pr\left[ \left|\left\{ i : X_i \ge n^{-2} \right\}\right| \le 5 \right] \ge 1 - e^{2/e - 2} - e^{-8/3} \ge 0.64.

Moreover, for any function g : \mathbb{Z}_{++} \to \mathbb{R}_{++} and any n satisfying 1 \le \ln(g(n)) < 3n - 1,

    \Pr\left[ \left|\left\{ i : X_i \ge n^{-2} \right\}\right| \le \ln(g(n)) \right] \ge 1 - e^{2/e - 1/3} \left(\frac{1}{g(n)}\right)^{1/3} - e^{-4/9} \left(\frac{1}{g(n)}\right)^{4/9}.

(Take for instance g to be the inverse Ackermann function.)
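As a consistency check (this instantiation is implicit in the statements above, not quoted from the note), Theorem 1.1 is essentially Theorem 1.2 with c_1 = 1, c_3 = c_0, and c_2 = 6 c_0: the failure probability becomes

    \frac{1}{e^{1/3}} \left(\frac{1}{n}\right)^{2 c_0 - c_0} + \frac{1}{e^{4/9}} \left(\frac{1}{n}\right)^{8 c_0 / 3} = e^{-1/3}\, n^{-c_0} + e^{-4/9}\, n^{-8 c_0 / 3} \le n^{-c_0},

where the last inequality holds for all n \ge 2 and c_0 \ge 1, since e^{-4/9}\, n^{-5 c_0 / 3} \le e^{-4/9}\, 2^{-5/3} \approx 0.20 < 1 - e^{-1/3} \approx 0.28.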
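The following Monte Carlo sketch checks Theorem 1.1 empirically and mimics the kind of sweep summarized in Figure 1a. It is a minimal illustration, not the note's actual experiment: the figure's exact setup is not given here, and the helper log_dirichlet is my own (it samples in log space via the identity Gamma(a) =_d Gamma(a+1) U^{1/a}, since direct gamma draws underflow to zero at tiny shape parameters).

import numpy as np

rng = np.random.default_rng(0)

def log_dirichlet(alpha, n):
    # One draw from Dir(alpha, ..., alpha), returned as log-coordinates.
    # Gamma(a) =d Gamma(a+1) * U^(1/a) keeps tiny shapes from underflowing.
    logg = np.log(rng.gamma(alpha + 1.0, size=n)) + np.log(rng.random(n)) / alpha
    m = logg.max()
    return logg - (m + np.log(np.exp(logg - m).sum()))  # normalized log X_i

n, trials = 1000, 2000
for c0 in (1, 2, 3):                    # thresholds n^-1, n^-2, n^-3
    bound = 6 * c0 * np.log(n)          # Theorem 1.1's count bound
    assert bound + 1 < 3 * n            # the theorem's side condition
    counts = np.array([(log_dirichlet(1.0 / n, n) >= -c0 * np.log(n)).sum()
                       for _ in range(trials)])
    print(f"c0={c0}: mean count {counts.mean():5.1f}, "
          f"P[count > {bound:.1f}] = {(counts > bound).mean():.4f} "
          f"(Theorem 1.1: <= {n ** (-c0):.0e})")

If the theorem and Figure 1a are accurate, the mean count should grow roughly linearly in c_0 and like \ln(n) in n, while the empirical failure rate stays below n^{-c_0}.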
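The same harness, pointed at the \alpha = o(1/n) regime of Theorem 1.3, lets one eyeball the Figure 1b claim that \alpha = 1/n^2 typically yields a single nonsmall coordinate. Again this is a sketch under my assumptions, continuing with the log_dirichlet helper and rng defined above:

n, trials = 1000, 2000
counts = np.array([(log_dirichlet(1.0 / n**2, n) >= -2 * np.log(n)).sum()
                   for _ in range(trials)])
print("P[count <= 5] =", (counts <= 5).mean())   # Theorem 1.3 promises >= 0.64
print("P[count == 1] =", (counts == 1).mean())   # Figure 1b: usually exactly one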
Journal: CoRR
Volume: abs/1301.4917
Pages: -
Published: 2013